Hashing: Efficiency & Load Factor

PolyU DSAI2201 Lecture 13 2025-11-25

Defining the Load Factor ($\lambda$)

The Load Factor ($\lambda$) is the central metric determining the real-world efficiency of a hash table. It quantifies how full the table is, directly correlating to the probability and length of collisions.

$$ \lambda = \frac{\text{Number of Items Stored } (N)}{\text{Size of Table/Buckets } (M)} $$
  • Separate Chaining: $\lambda$ represents the average length of the linked lists (chains).
  • Open Addressing (Probing): $\lambda$ represents the table saturation, directly correlating to the expected number of probes required to find an empty slot or a specific item.

Load Factor & Complexity

To achieve the theoretical $O(1)$ average case performance for Search, Insert, and Delete operations, the Load Factor ($\lambda$) must be constant and generally small (e.g., $< 1.0$ or $< 0.75$).

Performance FactorAverage Case (Target $\lambda$)Worst Case (High $\lambda$)
Search/Insert Time$O(1 + \lambda) \approx O(1)$$O(N)$ (Approaches linear search)
Collision RateLow/ManageableHigh
Space Overhead$O(N + M)$$O(N + M)$

Key Implication: If $M$ is fixed and $N$ grows, $\lambda$ increases, degrading performance toward $O(N)$.

Strategy: Managing Saturation (Re-hashing)

To prevent performance degradation caused by high $\lambda$, efficient hash tables implement Re-hashing (resizing).

  1. Threshold Monitoring: The system monitors $\lambda$. If it exceeds a predefined threshold (e.g., 0.75 for Open Addressing, or 2.0 for Chaining), resizing is triggered.
  2. Expansion: The table size $M$ is increased (e.g., doubled to $M'$).
  3. Redistribution: All existing $N$ items are re-inserted into the new, larger table $M'$, using a potentially new hash function (mod $M'$).

Result: $\lambda$ immediately drops, restoring the structure's $O(1)$ average performance guarantee. Note that the re-hashing operation itself temporarily costs $O(N)$.

📝 Interactive Quiz

1. What is the primary purpose of re-hashing a hash table?

  • A) To sort the elements within the table.
  • B) To reduce the load factor and maintain $O(1)$ average performance.
  • C) To immediately free up unused memory slots.
  • D) To check for data corruption.

2. In a hash table using Separate Chaining, if $N=20$ items are stored in $M=10$ buckets, what does the load factor $\lambda=2$ represent?

  • A) The table is completely full and cannot accept new items.
  • B) The worst-case search time is exactly 2 comparisons.
  • C) The average length of the linked lists (chains) is 2.
  • D) At least one bucket must contain exactly 2 items.

3. A system uses an Open Addressing hash table with $M=10,000$ slots. It stores $N=1,500$ items. What is the expected average time complexity for a search?

  • A) $O(1)$
  • B) $O(\log N)$
  • C) $O(N)$
  • D) $O(\lambda)$

4. Although a single re-hash operation costs $O(N)$, why is the amortized cost of insertion still considered $O(1)$?

  • A) The high cost is infrequent and spread out over many cheap insertions.
  • B) Re-hashing only happens when the table is empty.
  • C) The $O(N)$ cost is a theoretical maximum that never actually occurs.
  • D) Modern CPUs can perform the re-hash in a single clock cycle.